Evaluating the mouse neural precursor line, SN4741, as a suitable proxy for midbrain dopaminergic neurons

To overcome the ethical and technical limitations of in vivo human disease models, the broader scientific community frequently employs model organism-derived cell lines to investigate of disease mechanisms, pathways, and therapeutic strategies. Despite the widespread use of certain in vitro models, many still lack contemporary genomic analysis supporting their use as a proxy for the affected human cells and tissues. Consequently, it is imperative to determine how accurately and effectively any proposed biological surrogate may reflect the biological processes it is assumed to model. One such cellular surrogate of human disease is the established mouse neural precursor cell line, SN4741, which has been used to elucidate mechanisms of neurotoxicity in Parkinson disease for over 25 years. Here, we are using a combination of classic and contemporary genomic techniques - karyotyping, RT-qPCR, single cell RNA-seq, bulk RNA-seq, and ATAC-seq - to characterize the transcriptional landscape, chromatin landscape, and genomic architecture of this cell line, and evaluate its suitability as a proxy for midbrain dopaminergic neurons in the study of Parkinson disease. We find that SN4741 cells possess an unstable triploidy and consistently exhibits low expression of dopaminergic neuron markers across assays, even when the cell line is shifted to the non-permissive temperature that drives differentiation. The transcriptional signatures of SN4741 cells suggest that they are maintained in an undifferentiated state at the permissive temperature and differentiate into immature neurons at the non-permissive temperature; however, they may not be dopaminergic neuron precursors, as previously suggested. Additionally, the chromatin landscapes of SN4741 cells, in both the differentiated and undifferentiated states, are not concordant with the open chromatin profiles of ex vivo, mouse E15.5 forebrain- or midbrain-derived dopaminergic neurons. Overall, our data suggest that SN4741 cells may reflect early aspects of neuronal differentiation but are likely not a suitable a proxy for dopaminergic neurons as previously thought. The implications of this study extend broadly, illuminating the need for robust biological and genomic rationale underpinning the use of in vitro models of molecular processes.


Abstract
To overcome the ethical and technical limitations of in vivo human disease models, the broader scienti c community frequently employs model organism-derived cell lines to investigate of disease mechanisms, pathways, and therapeutic strategies. Despite the widespread use of certain in vitro models, many still lack contemporary genomic analysis supporting their use as a proxy for the affected human cells and tissues. Consequently, it is imperative to determine how accurately and effectively any proposed biological surrogate may re ect the biological processes it is assumed to model. One such cellular surrogate of human disease is the established mouse neural precursor cell line, SN4741, which has been used to elucidate mechanisms of neurotoxicity in Parkinson disease for over 25 years. Here, we are using a combination of classic and contemporary genomic techniques -karyotyping, RT-qPCR, single cell RNAseq, bulk RNA-seq, and ATAC-seq -to characterize the transcriptional landscape, chromatin landscape, and genomic architecture of this cell line, and evaluate its suitability as a proxy for midbrain dopaminergic neurons in the study of Parkinson disease. We nd that SN4741 cells possess an unstable triploidy and consistently exhibits low expression of dopaminergic neuron markers across assays, even when the cell line is shifted to the non-permissive temperature that drives differentiation. The transcriptional signatures of SN4741 cells suggest that they are maintained in an undifferentiated state at the permissive temperature and differentiate into immature neurons at the non-permissive temperature; however, they may not be dopaminergic neuron precursors, as previously suggested. Additionally, the chromatin landscapes of SN4741 cells, in both the differentiated and undifferentiated states, are not concordant with the open chromatin pro les of ex vivo, mouse E15.5 forebrain-or midbrain-derived dopaminergic neurons. Overall, our data suggest that SN4741 cells may re ect early aspects of neuronal differentiation but are likely not a suitable a proxy for dopaminergic neurons as previously thought. The implications of this study extend broadly, illuminating the need for robust biological and genomic rationale underpinning the use of in vitro models of molecular processes. Background in vitro cellular surrogates present an excellent opportunity for elucidating the molecular mechanisms behind human disease without the ethical and technical limitations of in vivo systems. As such, most studies of human disease that employ genomic or cellular manipulations or assays that require high cell quantity and quality, are often conducted in vitro to ensure biological and statistical robustness [1][2][3]. For example, in vitro models are frequently employed in studies of the role of genomic regulation in human disease, identi cation of candidate genes and regulatory elements and evaluation of their functional characteristics through genetic manipulations and high-throughput assays [4][5][6][7]. As genome-wide association studies (GWASs) continue to indict human disease-associated variants, it is becoming evident that most of them lie within non-coding regions of the genome [8]. Such regions frequently represent cis-regulatory elements (CREs), required for the transcriptional modulation of cognate genes.
The assays required to evaluate their function [4,9,10], or connect CREs with the promoters they modulate [11], often require large cell numbers, making, in vitro cellular systems the preferred strategy.
Prioritizing non-coding GWAS variants and disease-relevant sequences for extensive investigation requires knowledge of their chromatin accessibility status. Open chromatin is prone to harbor functional sequences; and since chromatin accessibility pro les vary across cell types and developmental time, it is important to prioritize disease-associated variants that lie within open chromatin regions in the diseaserelevant cell type(s) [8, 12,13]. It is also critical to functionally evaluate the biological consequences of disease-associated variation, test the e cacy of potential therapeutics, and observe the effects of disease-relevant insults in the appropriate cellular context[8, 14,15]. Therefore, when studying disease associated variation, the most effective in vitro cellular surrogates should ideally mimic the chromatin architecture and transcriptional pro les of the in vivo cell types affected by disease.
In Parkinson disease (PD), midbrain (MB) dopaminergic (DA) neurons in the substantia nigra (SN) are the primary affected cell type [16]. Preferential degeneration of these neurons elicits a progressive neurodegenerative disorder characterized by motor de cits [16]. As the second most common neurodegenerative disorder, affecting approximately 1% of adults over 70 years old [17,18], PD is the focus of extensive research efforts. As such, various cell lines have been used as in vitro proxies of MB DA neurons to study the cellular impacts of PD-relevant insults, as well as candidate PD-associated sequences, their functions, and their potential as therapeutic targets [19].
One such cell line, SN4741, is reported to be a clonal DA neuronal progenitor line that was established in 1999 from mouse embryonic day 13.5 (E13.5) SN tissue [20]. The SN was dissected from transgenic mice containing 9.0 kb of the 5' promoter region of tyrosine hydroxylase (TH), fused to the temperaturesensitive mutant Simian Virus 40 T antigen (SV40Tag-tsA58) oncogene [20]. The goal of this TH promoter transgene was to enable selective acquisition of DA neurons, while the purpose of the SV40Tag oncogene was to facilitate conditional immortalization of the cell line. The temperature sensitive mutant form of this immortalizing gene (tsA58) should permit uncontrolled differentiation and proliferation at the permissive temperature (33°C), maintain cells in an undifferentiated state at 37°C, and since tsA58 displays diminished activity at 39°C, it should direct differentiation that more closely resembles primary cells when the culture is shifted to this non-permissive temperature [20].
As an established mouse neural precursor line, SN4741 cells have since been used to elucidate mechanisms of neurotoxicity in PD [21][22][23][24][25], test the e cacy of therapeutic targets against PD relevant insults [26,27], and assay the impacts of PD-associated genetic mutation [28,29] and transcriptional regulation [30][31][32]. Important technological advances have also arisen since the genesis and implementation of the SN4741 cell line, including chromatin conformation capture technologies [11,33,34], RNA-sequencing (RNA-seq) [35], and assay for transposase-accessible chromatin using sequencing (ATAC-seq) [36]. In this study, we exploit these modern approaches to assess the suitability of SN4741 as an in vitro proxy for DA neurons and determine the extent to which this cell line is appropriate for prioritizing and investigating the mechanisms by which PD-associated variation confers disease risk.
Through a combination of karyotyping, single-cell (sc)RNA-seq, and RT-qPCR, we evaluate the genomic integrity of this immortalized cell line, determine how the transcriptional pro le and expression of DA neuron marker genes in this line changes between undifferentiated (37°C) and differentiated (39°C) states, and evaluate whether these transcriptional changes are consistent throughout the differentiation process. The data we collect suggests that while these cells show evidence that they are exiting a proliferative state and entering a more differentiated state, they are an unsuitable model of SN DA neurons, as they possess aneuploidy and structural abnormalities, as well as consistently low expression of DA neuron markers upon differentiation. We employ bulk RNA-seq to quantify transcriptional differences between differentiated and undifferentiated SN4741 cells and determine that, while transcriptional pro les change to re ect differentiation, they do not show strong evidence that these cells are entering a DA state. We then compare chromatin accessibility pro les of undifferentiated and differentiated SN4741 cells with those of ex vivo mouse E15.5 midbrain (MB) and forebrain (FB) neurons and determine that the chromatin accessibility pro les of SN4741 cells do not re ect the cellular population from which they were derived. Collectively, cytogenetic, chromatin, and transcriptional data suggest that the SN4741 cell line is not as strong a cellular surrogate for DA neurons as previously thought. Ultimately, this work underscores the importance of leveraging technological advances in genomic and cellular analyses to evaluate, and re-evaluate, the suitability of established model systems in disease biology.

Results
SN4741 is an Unstable Polyploid Cell Line: G-band karyotyping was performed on 20 SN4741 metaphase spreads and a representative karyogram ( Figure 1A) was generated. The karyotype was interpreted as an abnormal, polyploid, karyotype with complex numerical abnormalities and unbalanced, structural abnormalities. While most, but not all, abnormalities were consistently present in these cells; none of the 20 cells assessed had the same chromosome complement, and no normal cells were observed. All cells possessed at least one copy of each mouse autosome (1 through 19) and female sex chromosomes; however, most chromosomes were triploid in each cell ( Figure 1B). These karyotypic abnormalities already call into question the viability of these cells as a surrogate for human neurodegenerative disease. Since these cells are genetically unstable, there may be large experimental batch effects as the cell populations shift across divisions. Furthermore, gene dosage effects that severely deviate from normal copy number in DA neurons may lead to confounding and unreliable results. forkhead box A2 (Foxa2), nuclear receptor subfamily 4 group A, member 2 (Nr4a2), solute carrier family 6 member 3 (Slc6a3), and tyrosine hydroxylase (Th). Compared to the expression of these markers in the undifferentiated SN4741 cell culture (37°C), relative expression of all markers remained at similar levels when the cells were shifted to the higher temperature condition (39°C), except for an increase in Th expression ( Figure 1C). While elevated Th has been used as a marker of differentiation into DA neurons in previous work with SN4741 cells [20,37,38], an increase in Th expression is not exclusively associated with DA neurons. Th is a marker for all catecholaminergic neurons (dopaminergic and adrenergic) [39], and evidence suggests that Th expression is transient in other neurons throughout embryonic development [40][41][42]. These results indicate that at the non-permissive temperature, SN4741 cells may not be fully differentiating into DA neuron progenitors.
scRNA-seq Reveals that SN4741 Cells Differentiate at the Non-Permissive Temperature, but Lack Expression of DA Neuron Marker Genes: To assess the consistency of the differentiation protocol, transcriptomes were generated from ≥17,000 cells across four replicates cultured at the permissive temperature (37°C) and four replicates cultured at the non-permissive temperature (39°C). Analysis of the single-cell transcriptomes reveal that the cells cluster by growth temperature ( Figure 1D). This separation of cells by temperature is accompanied by changes to the cell cycle, with cells at the permissive (37°C) temperature mostly in either G2M or S phase, while cells at the non-permissive temperature (39°C) are mostly in G1 phase ( Figure 1E), indicating that they may be differentiated. In expression analysis, markers of proliferation that are expressed in G2M phase, like Marker of Proliferation Ki-67 (Mki67), are predominantly expressed in cells at the permissive temperature ( Figure 1F), corroborating the cell cycle analysis. When shifted to the non-permissive temperature, SN4741 cells appear to robustly differentiate, exempli ed by a decrease in the expression of Nestin (Nes), a neural stem cell marker ( Figure 1F). Additional transcriptional changes at this nonpermissive temperature include an increase in the expression of a neural marker CUGBP Elav-Like Family Member 5 (Celf5) [43], as well as genes that have been found to regulate neural stem cell self-renewal (Inhibitor of DNA Binding 2, Id2; High Mobility Group AT-Hook 2, Hmga2) [44,45], neurogenesis (Iroquois Homeobox 3, Irx3) [46], and arborization of neurons (Sodium Voltage-Gated Channel Beta Subunit 1, Scn1b) [47], indicating that these cells may be differentiating into neural precursor cells ( Figure 1G). Furthermore, Cadherin 13 (Cdh13), a modulator of GABAergic neurons, is signi cantly upregulated at this non-permissive temperature ( Figure 1G), while the expression of a variety of DA neuron markers fail to be detected in either the permissive or non-permissive temperatures. Markers, including Aldehyde Dehydrogenase 1 Family Member A1 (Aldh1a1), Foxa2, LIM Homeobox Transcription Factor 1 Beta (Lmx1b), Nr4a2, Paired-like homeodomain 3 (Pitx3), Slc6a3, and Th have few to no reads assigned to them ( Figure 1H). Collectively, these results suggest that while SN4741 cells are differentiating towards a neuronal fate when shifted to the nonpermissive temperature, they may not be entering a clear DA trajectory under these conditions. ATAC-seq Identi es Differential Open Chromatin Pro les in SN4741 Cells at the Permissive and Non-Permissive Temperatures: To consider how chromatin accessibility changes between the two temperatures, we performed ATAC-seq on SN4741 cells in both the undifferentiated and differentiated states. Libraries were con rmed to be technically and biologically relevant (Supplemental Figure 1), and well correlated between replicates (Supplemental Figure 2; Figure 2A-B).
A total of 83,778 consensus open chromatin regions were identi ed, with 70% of peaks shared between the two temperatures ( Figure 2C). Principal component analysis of these consensus regions suggests a clear separation in the chromatin state between the two temperatures ( Figure 2D). To explore these differences, we performed differential accessibility analysis with DiffBind[48], to nd a total of 5,055 differentially accessible regions: 2,654 enriched in the permissive temperature and 2,401 enriched at the non-permissive temperature (log 2 FC > 1, FDR < 0.05; Figure 2E).
Gene ontology of genes adjacent to differentially accessible regions largely recapitulate the scRNA-seq analysis; functions associated with regions preferentially open at the permissive temperature suggest the maintenance of the undifferentiated, cell-cycling state ( Figure 2F). The gene ontology of genes adjacent to those regions preferentially accessible at the non-permissive temperature is less coherent and suggest cell differentiation towards several fates (blood vessels, cartilage, tooth), none of which are neuronal and, perhaps unsurprisingly, demonstrate evidence of response to temperature stress [49] ( Figure 2G).
Overall, there is a shift in the chromatin accessibility between the two temperatures that indicate the cells transition from an undifferentiated to differentiated state as the cells move from the permissive to nonpermissive temperature. The differences in chromatin accessibility further con rm that SN4741 cells are not differentiating towards a neuronal lineage.
Comparison of chromatin accessibility in SN4741 cells fails to recapitulate the chromatin landscape of ex vivo mouse DA neurons: To evaluate the potential relationship between SN4741 cells and DA neurons they are presumably modelling, we compared the chromatin accessibility between the SN4741 cells at both temperatures to previously generated ex vivo mouse embryonic DA neuron chromatin accessibility pro les (NCBI GEO: GSE122450; [50]).
Considering the consensus peak set of 165,334 regions generated from all in vivo and ex vivo samples and their normalized read counts, we observe a clear separation between the SN4741 cell culture model and the ex vivo DA neurons by correlation and principal component analysis ( Figure 3A, B). Examining the raw overlap of peaks between the SN4741 cells and ex vivo neurons, just 12.5% (20,667) are present in all four cell types/conditions ( Figure 3C). The chromatin pro les are largely exclusive between the SN4741 cell culture model and the ex vivo DA neurons: 41.3% (68,304) of regions are accessible solely in the ex vivo neuron populations and 40% (65,857) are exclusively accessible in the SN4741 cell culture models. There is little overlap between the ex vivo and cultured samples. In comparison of the ex vivo midbrain DA neurons to the non-permissive, differentiated temperature, only 183 peaks are restricted to these populations.
The chromatin pro les between ex vivo embryonic DA neurons and their prospective in vitro cell culture surrogate are virtually independent. They exhibit scant overlap in their global chromatin pro les and bear little resemblance to each other at regulatory regions of key DA neuron genes ( Figure 3D). Neither the analysis of the SN4741 chromatin accessibility pro les in isolation or in comparison with ex vivo neurons would suggest these cells to be appropriate models of embryonic DA neurons.
Transcriptional Changes in SN4741 Cells Indicate Differentiation from Pluripotent Stem Cells into Brain Cells that do not Fully Resemble MB DA Neurons: Bulk RNA-seq data were also generated for SN4741 cells, at both the permissive and non-permissive temperatures, to determine whether transcriptome changes re ect differentiation into DA neurons, or other neural cell types. To evaluate the RNA-seq libraries, quality-control measures were performed in silico (Supplemental Figure 3). PCA (Supplemental Figure 3B), and sample-sample distances (Supplemental Figure 3C) rea rmed that samples cultured at the same temperature are more like oneanother than samples cultured at the alternate temperature.
We found that 735 genes were upregulated at the non-permissive temperature (adjusted p-value < 0.01 and log 2 FC > 1.5), and 954 genes were downregulated (adjusted p-value < 0.01 and log 2 FC< -1.5) at the non-permissive temperature. The list of genes signi cantly downregulated at the non-permissive temperature was submitted to Enrichr (https://maayanlab.cloud/Enrichr/) [51][52][53] for gene ontology (GO) and analysis of cell type markers. Consistent with the observation that cells at the non-permissive temperature are differentiated and in G1 phase of the cell cycle, downregulated genes resulted in GO terms strongly enriched for mitotic and DNA replication processes ( Figure 4A). Additionally, signi cantly downregulated genes at the non-permissive temperature overlap with subsets of PanglaoDB [54] cell type marker genes, suggesting that these cells are shifting away from a state that resembles neural stem cells ( Figure 4B).
Similarly, the list of signi cantly upregulated genes was submitted to Enrichr for GO and analysis of cell type markers. As expected, upregulated genes resulted in GO terms for biological processes that indicate a more terminally differentiated cell type ( Figure 4C): "synaptic vesicle docking," "negative regulation of osteoblast proliferation," "lens ber cell differentiation," "regulation of osteoblast proliferation," and "forebrain regionalization". While not included in the top 10 terms by combined score ranking, "neuron remodeling," "synaptic transmission, glutamatergic," "neuron maturation," and "synaptic transmission, cholinergic" were also identi ed as signi cantly associated terms. Notably, "synaptic transmission, dopaminergic" and "dopaminergic neuron differentiation" were also listed as insigni cant terms ( Figure  4C), as Th was the lone overlapping marker gene for these terms.
In line with GO terms enriched for biological processes involving differentiation, possibly in neuronal cells, overlapping PanglaoDB [54] cell type marker genes suggest that SN4741 cells at the non-permissive temperature most signi cantly resemble immature neurons ( Figure 4D). "Oligodendrocytes," "retinal progenitor cells," "satellite glial cells," "dopaminergic neurons," "adrenergic neurons," "GABAergic neurons," and "glutamatergic neurons" were also listed as cell types with signi cant marker gene overlap.
The distribution of various cell type marker genes on a volcano plot, indicating the log 2 FC in expression and -log 10 adjusted p-values of DE genes, reveals that the speci c genes overlapping "pluripotent stem cell" markers (26/112), cluster as the most highly signi cantly downregulated genes ( Figure 4E). In contrast, only two of the upregulated marker genes overlapping "immature neurons" (16/136, Figure 4F) and "oligodendrocytes" (17/178, Figure 4G) cluster in a similarly strong way. Plotting the 11/119 overlapping upregulated genes for "dopaminergic neurons" ( Figure 4H) reveals that 7/11 overlapping genes (Celf5, Dpys15, Cacna1b, Tmem179, Nova2, Nrx1, and Cntn2) are also marker genes for immature neurons. Plotting the DA neuron markers also assayed by RT-qPCR validates that the relative expression of these markers is consistent between these highly sensitive assays. At 39°C, Th expression increases (log 2 FC = 1.552651); Nr4a2 expression decreases (log 2 FC = -1.042794), and expression of Slc6a3 and Foxa2 is not signi cantly different between conditions: (Slc6a3 was ltered out due to low read counts across both temperature conditions and Foxa2 log 2 FC = -0.4142541).
Finally, previously generated reads per kilobase of exon per million reads mapped (RPKM) from ex vivo E15.5 mouse embryonic DA neuron bulk RNA-seq [50] were used to compare how closely the SN4741 transcriptome resembles the neuronal populations they are expected to model. Similar to our results comparing chromatin accessibility between these two datasets, correlation of RPKM shows a clear separation between the SN4741 cell culture model and the ex vivo DA neurons ( Figure 5D). Collectively, these results con rm that at the non-permissive temperature, SN4741 cells are no longer rapidly dividing, neural stem cells. However, while the transcriptional pro le of these cells indicates that they are differentiating towards cell types present in the brain, these cells do not fully possess characteristics of the MB DA neurons they are meant to model.

Discussion
It is critically important that studies of human disease generate biologically accurate data, whether aimed at elucidating molecular mechanisms, onset and progression, or management and therapeutics. In the context of discovery biology or the illumination of human health and disease mechanisms, misattribution of cellular identity, or other deviations from biological accuracy, may result the misinterpretation of biological ndings or misdirected research efforts. When studying human disease, cellular surrogates are often used to overcome the ethical and technical limitations of employing animal models. Therefore, it is imperative that disease-relevant insights are predicated on robust data generated from model systems representing human biology as accurately as possible.
Here, we demonstrate the importance of assessing in vitro models of disease to determine the extent to which they can yield biologically accurate data that can be used to inform aspects of human disease. The SN4741 cell line has been used to study neurotoxicity and therapeutic interventions [21][22][23][24][25][26][27], PDassociated genetic mutation [28,29] and cell signaling and transcriptional regulation [30][31][32]37], since it was initially characterized as an immortal, mouse MB-derived cell line that differentiates into DA neurons at a non-permissive temperature [20]. However, contemporary genomic analyses have not been leveraged to characterize and evaluate the SN4741 cell line as a suitable proxy for DA neurons in PD, until now.
We employed karyotyping, RT-qPCR, and scRNA-seq to assess the genomic stability of these cells and determine how consistently they differentiate into DA neurons at the non-permissive temperature. We generated bulk RNA-seq and ATAC-seq data from this cell line at both the permissive and non-permissive temperatures, to extensively characterize this cell line and document how transcriptional landscapes and chromatin accessibility pro les shift in response to temperature-induced differentiation and compare to known pro les of ex vivo DA neurons. Our results suggest that SN4741 is an unstable, polyploid cell line that is unlikely to be a viable differentiation model of DA neurons; and thus, is likely not a robust proxy by which to study MB DA neurons in the context of human phenotypes, including PD, schizophrenia, addiction, memory, or movement disorders.
The results of karyotyping alone indicate that any data generated using SN4741 cells may be biologically inaccurate due to extreme variability in chromosome complement and therefore, copy number variation, between individual cells. Consequently, the results of previous studies evaluating neurotoxicity [21,23,24,26,27,30,32,38,67], cellular signaling pathways [22,25,28,68,69], and transcriptional pro ling [37] in these cells may have been unduly in uenced by the extreme imbalance in gene dosage that we found to vary from cell to cell. For example, alpha-synuclein (SNCA) has been consistently implicated in PD risk [70][71][72][73], particularly due to variants that promote α-synuclein misfolding [74] and overexpression [50] or events that result in gene ampli cation [75,76]. Snca is present on mouse chromosome 6 and the karyotypes generated for SN4741 cells show that chromosome 6 is triploid in most assayed cells (Fig. 1B). Therefore, using the SN4741 cell line to model neurodegeneration in PD may result in inaccurate data due to an exaggerated vulnerability towards degeneration imposed by elevated Snca copy number, by gene dosage effects of other interacting gene products in relevant pathways, or by the structural instability of this line.
Even if this cell line could be adopted to study Snca overexpression/ampli cation, ATAC-seq pro ling of open chromatin regions in this cell line at the permissive and non-permissive temperatures indicates that these cells do not possess chromatin accessibility pro les similar to those of ex vivo, mouse E15.5 MB neurons. In PD, disease is characterized by the degeneration of MB DA neurons, while DA neurons of the FB are spared. Therefore, the chromatin pro les of MB DA neurons, as well as the differentially accessible regions of the genome between MB and FB neurons, may in uence the preferential vulnerability of MB neurons in PD [50]. In the context of exploiting these chromatin pro les to study PD-associated variability and neurotoxicity, SN4741 cells are likely a poor model, as the open chromatin regions of these cells are not a reliable proxy for mouse E15.5 MB or FB DA neurons.
The chromatin accessibility pro les of SN4741 cells not only fail to cluster with ex vivo populations of mouse MB neurons, but the transcriptional landscapes of these cells suggest that these cells have shifted towards a more differentiated state that may be less DA than previously thought. Examination of cell cycle markers by scRNA-seq demonstrates that SN4741 cells at the non-permissive temperature are more differentiated than cells at the permissive temperature, as expected [20]. GO terms for genes that are signi cantly downregulated at the non-permissive temperature reinforces that these cells are no longer rapidly dividing, pluripotent stem cells. However, RT-qPCR, scRNA-seq, and bulk RNA-seq in these cells fail to detect signi cant upregulation of most key DA neuron markers in the differentiated cells, except for Th. Th is not exclusively expressed by DA neurons at embryonic timepoints [40][41][42]. In fact, signi cantly upregulated genes in SN4741 cells at the non-permissive temperature that overlap with GO terms and cell cycle marker genes suggests that Th is the only signi cantly upregulated gene overlapping with biological processes involving DA neurons. Rather, additional overlapping cell type marker genes suggest that these cells more closely resemble immature neurons.
In parallel, we generated promoter capture (pc)Hi-C data at the non-permissive temperature with the intention of exploring how non-coding disease-relevant variants interact with promoters and potentially regulate gene expression in MD DA neurons. As our group is focused on PD-associated variation, which is unlikely to act broadly in immature neurons, our group has not analyzed the resulting data, beyond basic quality control (Supplemental Fig. 4). While the SN4741 cells at the non-permissive temperature fail to recapitulate the transcriptomic or chromatin state of DA neurons, it is of potential interest for follow-up studies that they do resemble some immature neuron types. Although not analyzed by our group, we generated output les for interaction detection, and this data has been made publicly available for others to explore (accessible through: https://github.com/rachelboyd/SN4741_pcHiC), as it may be useful to study genomic interactions at promoters driving an immature neuronal state. However, the cell type best represented by SN4741 cells at the non-permissive temperature still requires deeper characterization.
Any data generated using SN4741 cells in the context of DA neuron modeling and/or PD must be interpreted with caution and in light of the appropriate caveats. Due to the instability and polyploidy of this cell line, we recommend that the use of SN4741 cells for PD-related research be re-evaluated. Future studies designed to ne-tune the classi cation of these cells may support the use of SN4741 cells as a model of other neuronal or non-neuronal cells. Additionally, the differentiation trajectory of these cells may be amenable to intervention(s) that could drive their molecular state towards one that resembles DA neurons more closely.

Conclusions
This study establishes a valuable precedent with broad implications across biological and diseaserelated research. Prior to using SN4741 cells to study non-coding regulatory variation in PD, we characterized this cell line to determine its suitability as a model of DA neurons in PD, and found that these cells are unstable, polyploid cells that do not demonstrate strong molecular characteristics of MB DA neurons. These cells express low levels of DA neuron markers, and chromatin landscapes in differentiated SN4741 cells scarcely overlap open chromatin regions in ex vivo mouse E15.5 midbrain neurons. We demonstrate the importance of genomic characterization of in vitro model systems prior to generating data and valuable resources that may be used to inform aspects of human disease. In future studies that utilize in vitro models of any human disease, due diligence to con rm their suitability as surrogates could save time, resources, and possibly lives, by avoiding misdirection and advancing successful therapeutic development.

Methods
No animals were directly used in this study. All assays were carried out using animal-derived cell lines or publicly available animal-derived cell data.
Cell Culture SN4741 cells were obtained from the Ernest Arenas group at the Karolinska Institutet. SN4741 cells were con rmed to be mycoplasma free using a MycoAlert® Mycoplasma Detection Assay (Lonza) and were maintained in high glucose Dulbecco's Modi ed Eagle Medium (DMEM; Gibco 1196502), supplemented with 1% penicillin-streptomycin and 10% fetal bovine serum (FBS) in a humidi ed 5% CO 2 incubator at 37°C. Cells at 80% con uence were passaged by trypsinization approximately every 2-3 days. To induce differentiation, 24 hours after the cells were passaged, media was replaced by DMEM supplemented with 1% penicillin-streptomycin and 0.5% FBS at 39°C. Cells were allowed to grow and differentiate in these conditions for 48 hours before harvesting for experimentation.

G-Band Karyotyping
At passage 21, undifferentiated SN4741 cells were sent to the WiCell Research Institute (Madison, Wisconsin), at 40-60% con uency, for chromosomal G-band analyses. Karyotyping was conducted on 20 metaphase spreads, at a band resolution of >230, according to the International System for Human Cytogenetic Nomenclature.
cDNA Synthesis and RT-qPCR for DA neuron markers RNA was extracted from both differentiated and undifferentiated cells by following the RNeasy Mini Kit (QIAGEN) protocol, as written. 1 μg of each RNA sample underwent rst-strand cDNA synthesis using the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen) according to the oligo(dT) method. qPCR was performed with Power SYBR Green Master Mix (Applied Biosystems), using primers for β-actin (Actb), Foxa2, Nr4a2, Slc6a3, and Th (Table S1). Reactions were run in triplicate under default SYBR Green Standard cycle speci cations on the Viia7 Real-Time PCR System (Applied Biosystems). Normalized relative quanti cation and error propagation followed the data analysis and associated calculations proposed by Taylor et al. (2019) [77], with results normalized to Actb.

Single cell RNA-seq Library Preparation, Sequencing, and Alignment
Both differentiated (39°C) and undifferentiated (37°C) cells were trypsinized, and scRNA-seq libraries were generated following the Chromium 10X pipeline [78]. Four replicates at each temperature across >17,000 cells were assayed. Cell capture, cDNA generation, and library preparation were performed with the standard protocol for the Chromium Single Cell 3' V3 reagent kit. Libraries were quanti ed with the Qubit dsDNA High Sensitivity Assay (Invitrogen) in combination with the High Sensitivity DNA Assay (Agilent) on the Agilent 2100 Bioanalyzer. Single-cell RNA-sequencing libraries were pooled and sequenced on an Illumina NovaSeq 6000 (SP ow cells), using 2x50 bp reads per library, to a combined depth of 1.6 billion reads. The quality of sequencing was evaluated via FastQC. Paired-end reads were aligned to the mouse reference genome (mm10) using the CellRanger v3.0.1 pipeline. Unique molecular identi er (UMI) counts were quanti ed per gene per cell ("cellranger count") and aggregated ("cellranger aggr") across samples with no normalization.
Single Cell RNA-seq Analysis Using Seurat [79](v4.2.0), cells were ltered to remove stressed/dying cells (% of reads mapping to the mitochondria > 15%) and empty droplets or doublets (number of unique genes detected <200 or >6,000). Cells were scored for their stage in the cell cycle using "CellCycleScoring()" on cell cycle genes provided by Seurat ("cc.genes"). Cells were then normalized using "SCTransform" (vst. avor = "v2") and corrected for percent mitochondrial reads and sequence depth. Principal component (PC) analysis was performed and a PC cut-off was identi ed using "ElbowPlot()." Using this PC cutoff and a minimum distance of 0.001, UMAP clustering was used for dimensionality reduction. Expression was plotted on a log scale with "VlnPlot()" for a variety of proliferation and DA neuron markers.

ATAC-seq Library Preparation and Quanti cation
ATAC-seq libraries were generated for four replicates of undifferentiated (37°C) and differentiated (39°C) SN4741 cells, according to the Omni-ATAC protocol[80], with minor modi cations. Aliquots of 50,000 cells were centrifuged at 2000 x g for 20 mins at 4°C, and the resulting pellets were resuspended in 50μL of resuspension buffer. Cells were left to lyse for 3 minutes on ice before being centrifuged again at 2000 x g for 20 minutes at 4°C. The resulting nuclei pellets were then tagmented, as written, using 50μL of transposition mixture and then incubated at 37°C for 30 mins in a 1000 RPM thermomixer. After transposition, DNA was puri ed with the Zymo DNA Clean and Concentrator -5 Kit and eluted in 21μL of elution buffer.
Pre-ampli cation of the transposed fragments was performed according to the conditions outlined in the Omni-ATAC protocol[80]; however, 12 pre-ampli cation cycles were run in lieu of qPCR ampli cation to determine additional cycles. The ampli ed libraries were prepared according to the Nextera DNA Library Prep Protocol Guide, except that libraries were puri ed with 40.5μL AMPure XP beads (Beckman Coulter), and 27.5μL of resuspension buffer was added to each sample. All libraries were quanti ed with the Qubit dsDNA High Sensitivity Assay (Invitrogen) in combination with the High Sensitivity DNA Assay (Agilent) on the Agilent 2100 Bioanalyzer.

ATAC-seq Sequencing, Alignment, and Peak Calling
Libraries were sequenced on Illumina NovaSeq 6000 (SP ow cells), using 2x50 bp reads per library, to a total combined depth of 1.6 billion reads. The quality of sequencing was evaluated with FastQC (v0. 11 [91] and the genomic distribution of peaks were calculated using the Cis-regulatory Element Annotation System (CEAS) [92] and conservation tool of the Cistrome [93] pipeline. Analysis can be found at http://cistrome.org/ap/u/smcclymont/h/sn4741-atac-seq-ceas-andconservation. Each sample's peak le and BAMs were read into and analysed with DiffBind (v3.8.1) [48]. Peaks present in two or more libraries were considered in the consensus peakset. Reads overlapping these consensus peaks were counted with specifying summits = 100, bRemoveDuplicates = TRUE. These read counts were normalized with on the full library size using the RLE normalization method as it is native to the DESeq2 analysis we employed in the following step. The volcano plot was generated using a custom script using the output of the command, where th = 1 and fold = 0 and bCounts = T, to output all peaks regardless of their foldchange or signi cance. Signi cantly differentially accessible regions ( ltered for abs(Fold)>1 & FDR < 0.05) were submitted to GREAT (v4.0.4) [94] and the gene ontology of the nearest gene, as identi ed with the basal+extension method (where proximal was considered to be ±5kb) was assessed and plotted.

ATAC-seq comparison to ex vivo MB and FB DA neurons
Previously generated ATAC-seq libraries from ex vivo E15.5 mouse embryonic DA neurons [50] were reanalyzed in parallel, following all the above alignment and ltering steps. DiffBind (v3.8.1) was used to compare the samples, as above, and the R package UpSetR (v1.4.0) [95] was used to plot the overlap of peaks between conditions. Bulk RNA-seq Library Preparation, Sequencing, and Alignment Cells were run through QIAshredder (Qiagen) and total RNA was extracted using the RNeasy Mini Kit (Qiagen) according to the manufacturer's recommendations, except that RNA was eluted twice in 50μL of water. Total RNA integrity was determined with the RNA Pico Kit (Agilent) on the Agilent 2100 Bioanalyzer.
Genes with an average of at least 1 read for each sample were analyzed to identify differentially expressed (DE) genes between temperature conditions, using the function "DESeq." P-value distribution after differential expression (DE) analysis (Supplemental Figure 3D) veri ed that the majority of called DE genes are signi cant. Results (alpha = 0.01) were generated and subjected to log fold change shrinkage using the function "lfcShrink" (type = "apeglm") [101] for subsequent visualization and ranking. The function "plotMA" was used to generate MA plots, both before and after LFC shrinkage, to visualize the log 2 fold changes attributable to the non-permissive temperature shift over the mean of normalized counts for all the samples in the DESeqDataSet (Supplemental Figure 3E-F). MA plots demonstrated that log fold change shrinkage of the data successfully diminished the effect size of lowly expressed transcripts with relatively high levels of variability.
Volcano plots were generated using a custom function to visualize log 2 fold changes of speci c genes in the dataset. A gene was considered signi cantly differentially expressed if it demonstrated an adjusted pvalue < 0.01 and |log 2 FC| > 1.5. These signi cantly differentially expressed genes were submitted to Enrichr [51][52][53] for analyses within the "ontologies" and "cell types" categories. The upregulated and downregulated gene sets were passed to STRING [66] for analysis of protein-protein interactions and network relationships.
Bulk RNA-seq comparison to ex vivo MB and FB DA neurons Read counts from the SN4741 bulk RNA-seq dataset were converted to RPKM and compared to bulk RNAseq data from previously generated ex vivo mouse embryonic DA neurons (NCBI GEO: GSE122450; [50]). A Pearson correlation heatmap was generated using ggplot2 [102].
Promoter capture HiC library generation PcHiC was performed as previously described [103], with minor modi cations. Brie y, SN4741 cells were cultured at the non-permissive temperature and plated at ve million cells per 10cm dish. The cells were crosslinked using 1% formaldehyde, snap frozen using liquid nitrogen, and stored at -80°C. The cells were dounce homogenized and restriction enzyme digestion, using 400 units HindIII-HF overnight at 37°C. The total volume was maintained at 500µL, through addition of 1X NEBuffer 2.1. Heat inactivation was performed at 80°C for 20 minutes, and biotinylated-dCTP was used for biotin ll-in reaction. Blunt-end ligation was performed using Thermo T4 DNA ligase, with cohesive end units maintained at 15,000 and buffer and water volumes adjusted to ensure a total volume of 665µL was added to each Hi-C tube.
Cross-linking was performed overnight, with additional (50μL) proteinase K added for two hours the following day. DNA puri cation was split across two reactions using 2mL PhaseLock tubes, and volumes were adjusted accordingly. Each PhaseLock reaction was split again into two vials for ethanol puri cation, and centrifugation at step 6.3.8 was performed at room temperature. The pellets were dissolved in 450µL 1X TLE and transferred to a 0.5mL 30kD Amicon Column. After washing, the column was inverted into a new container, and no additional liquid was added to raise the volume to 100µL. All four reactions were combined, the total volume determined, and RNAseA (1mg/mL) equal to 1% of the total volume was added for 30 minutes at 37°C.
The libraries were assessed for successful blunt-end ligation by a ClaI restriction enzyme digest of PCR products, as previously described [103]. Biotin was removed from un-ligated ends and DNA was sheared to a size of 200-300bp using the Covaris M220 (High setting, 35 cycles of 30s "on" and 90s "off"; vortexing/spinning down samples and changing sonicator water every 5 cycles). Size selection was performed using AMpure XP magnetic beads, as previously described [103] except that all resuspension steps were increased by 5µL, so that 5μL could be used for QC with the High Sensitivity DNA Assay (Agilent) on the Agilent 2100 Bioanalyzer (at three stages: post-sonication, post-0.8x size-selection, and post-1.1x size-selection). The remaining protocol was performed as described. Capture probes (Arbor Biosciences; https://github.com/nbbarrientos/SN4741_pcHiC) were designed against mouse (mm10) RefSeq transcription start sites, ltering out "XM" and "XR" annotated genes. The remaining promoters were intersected with the in silico digested HindIII mouse genome, to retain all HindIII fragments containing a promoter. Potential probes sites were assessed ±330bp of the HindIII cut site on either end of the fragment and nalized probe sets were ltered using no repeats and "strict" criteria, as de ned by Arbor Biosciences. (v.3.5.1) before read coverage similarities and replicate correlation was assessed using the function "multiBamSummary" (in bins mode) to analyze the entire genome. A Pearson correlation heatmap was generated using the function "plotCorrelation" (Supplemental Figure 4D)  by RT-qPCR indicates that Foxa2, Nr4a2, and Slc6a3 remain at similar expression levels, but Th expression increases, when SN4741 cells are shifted from the permissive temperature (37°C) to the non-permissive temperature (39°C). D) UMAP plot of scRNA-seq at the permissive and non-permissive temperatures indicates that cells at each temperature are transcriptionally distinct. E) Analysis of scRNAseq data demonstrates that shifting the cells to the non-permissive temperature is accompanied by a shift in cell cycle stage from G2M and S phases to primarily G1 phase. F) Violin plots generated with scRNA-seq data show that Mki67, a marker of cellular proliferation, and Nes, a neural stem cell marker, are both expressed at the permissive temperature (37°C), with little to no expression at the non-permissive temperature (39°C). G) Violin plots generated with scRNA-seq data show that transcripts associated with immature neurons are upregulated when SN4741 cells are shifted to the non-permissive temperature. H) Violin plots generated with scRNA-seq data show that expression of DA neural markers, Aldh1a1, Foxa2, Lmx1b, Nr4a2, Pitx3, Slc6a3, and Th, remain at similar levels when SN4741 cells are shifted to the nonpermissive temperature. negative regulation of differentiation, as is appropriate for this temperature. G) Gene ontology of genes adjacent regions that are preferentially open at the non-permissive temperature are associated with a variety of differentiation fates (blood vessels, muscle cells, cartilage/chondrocytes). Additionally, two of the top gene ontology terms relate to the p38 MAPK cascade, which has been found to be activated as a cellular response to heat stress.   showing normalized bulk RNA-seq read counts from genes upregulated at 39°C that overlap the "immature neurons" predicted cell type. B) A bar chart showing normalized bulk RNA-seq read counts from genes upregulated at 39°C that overlap the "oligodendrocyte" predicted cell type. C) A bar chart showing normalized bulk RNA-seq read counts from genes downregulated at 39°C that overlap the